    Design of an AXI-SDRAM interface IP in a RISC-V processor

    PreDRAC is a RISC-V based SoC developed in collaboration by BSC, CIC-IPN, IMB-CNM (CSIC) and UPC. Its first version, sent for fabrication in May 2019, used a custom interface to access main memory through an FPGA. Because memory access is critical to processor performance, an AXI-SDRAM interface IP has been designed for integration into a future revision of the chip. No specific area, power or performance constraints are defined for the AXI-SDRAM interface, as the first step is to obtain a functional design with the verification setup required to ensure its proper operation once fabricated in silicon. The design of the IP covers several stages of the ASIC design flow: the initial RTL implementation, synthesis, verification through RTL and gate-level simulations, and a final power analysis. Final results show that this IP can be successfully integrated with the preDRAC SoC, replacing the custom interface and obtaining better performance. However, the AXI-SDRAM interface IP can be further improved in terms of both performance and power.

    Comparativa d'implementació hardware vs software en un sistema de processat d'imatge per a detecció de cares (Hardware vs. software implementation comparison in an image processing system for face detection)

    Image processing is one of the fields in which the use of dedicated hardware is, in principle, clearly advantageous. The project aims to evaluate this theoretical advantage by comparing two implementations, one in software and one in hardware, of a face detection system based on the OpenCV library. Development will be carried out on the Zedboard board and the Zynq device, which allows both types of implementation.

    In this project, two implementations of the same face detection system have been developed, one based on hardware and the other on software. The main goal is to compare them and determine the advantages or disadvantages of using dedicated hardware. The detection algorithm is a Haar-like feature classifier based on the method proposed by Viola and Jones to accelerate object detection. Both the original code of the algorithm and the classifier data were obtained from the OpenCV library. Vivado HLS, which synthesizes C/C++ code directly into an RTL description, was used to implement the hardware version of the system. The project was developed in stages. First, the original code was analysed and simplified to obtain a version that could serve as the basis for both the hardware and software implementations. Finally, the results of the two implementations were compared, and the efficiency of Vivado HLS as a development tool was analysed.
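
    The Viola and Jones method mentioned above evaluates Haar-like features as differences of rectangle sums, which an integral image (summed-area table) makes constant-time. The following is a minimal sketch of that idea, not the thesis code and not OpenCV's implementation; the function names and the 4x4 test image are hypothetical and chosen only for illustration.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above and to the left, including this pixel.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Hypothetical image: bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
feature = two_rect_feature(ii, 0, 0, 4, 4)
print(feature)
```

    Because each rectangle sum costs four table lookups regardless of its size, this kind of computation maps naturally to both a software loop and a fixed hardware datapath, which is the comparison the project is concerned with.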

    Vitruvius+: An area-efficient RISC-V decoupled vector coprocessor for high performance computing applications

    The maturity level of RISC-V and the availability of domain-specific instruction set extensions, like vector processing, make RISC-V a good candidate for supporting the integration of specialized hardware in processor cores for the High Performance Computing (HPC) application domain. In this article, we present Vitruvius+, the vector processing acceleration engine that represents the core of vector instruction execution in the HPC challenge that comes within the EuroHPC initiative. It implements the RISC-V vector extension (RVV) 0.7.1 and can be easily connected to a scalar core using the Open Vector Interface standard. Vitruvius+ natively supports long vectors: 256 double precision floating-point elements in a single vector register. It is composed of a set of identical vector pipelines (lanes), each containing a slice of the Vector Register File and functional units (one integer, one floating point). The vector instruction execution scheme is hybrid in-order/out-of-order and is supported by register renaming and arithmetic/memory instruction decoupling. On a stand-alone synthesis, Vitruvius+ reaches a maximum frequency of 1.4 GHz in typical conditions (TT/0.80V/25°C) using GlobalFoundries 22FDX FD-SOI. The silicon implementation has a total area of 1.3 mm² and maximum estimated power of ~920 mW for one instance of Vitruvius+ equipped with eight vector lanes.

    This research has received funding from the European High Performance Computing Joint Undertaking (JU) under Framework Partnership Agreement No 800928 (European Processor Initiative) and Specific Grant Agreement No 101036168 (EPI SGA2). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Croatia, France, Germany, Greece, Italy, Netherlands, Portugal, Spain, Sweden, and Switzerland. The EPI-SGA2 project, PCI2022-132935, is also co-funded by MCIN/AEI/10.13039/501100011033 and by the EU NextGenerationEU/PRTR. This work has also been partially supported by the Spanish Ministry of Science and Innovation (PID2019-107255GB-C21/AEI/10.13039/501100011033).

    Peer reviewed. Postprint (author's final draft).
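
    The register-size figures in the abstract can be checked with simple arithmetic. The sketch below derives the size of one vector register and the number of elements each lane holds for the eight-lane configuration. The round-robin mapping of elements to lanes is an assumption made for illustration only; the abstract does not state how elements are distributed across lanes, and the helper names are hypothetical.

```python
ELEMENTS_PER_REGISTER = 256  # from the abstract: 256 elements per vector register
ELEMENT_BITS = 64            # double precision floating-point element
NUM_LANES = 8                # configuration reported in the abstract

# Storage needed for one architectural vector register.
register_bytes = ELEMENTS_PER_REGISTER * ELEMENT_BITS // 8

# Elements of each register held by a single lane's register-file slice.
elements_per_lane = ELEMENTS_PER_REGISTER // NUM_LANES


def lane_of(element_index, num_lanes=NUM_LANES):
    """ASSUMPTION: elements are interleaved round-robin across lanes."""
    return element_index % num_lanes


def slot_in_lane(element_index, num_lanes=NUM_LANES):
    """ASSUMPTION: position of the element inside its lane's slice."""
    return element_index // num_lanes


print(register_bytes, elements_per_lane)
print(lane_of(9), slot_in_lane(9))
```

    Under these assumptions each vector register occupies 2 KiB and each of the eight lanes stores 32 of its elements.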

    DVINO: A RISC-V vector processor implemented in 65nm technology

    This paper describes the design, verification, implementation and fabrication of the Drac Vector IN-Order (DVINO) processor, a RISC-V vector processor capable of booting Linux jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The DVINO processor includes an internally developed two-lane vector processor unit as well as a Phase Locked Loop (PLL) and an Analog-to-Digital Converter (ADC). The paper summarizes the design from the architectural level through logic synthesis and physical design in CMOS 65nm technology.

    The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. The authors are part of RedRISCV which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politecnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for Center for Research in Computing (CIC-IPN).

    Peer reviewed. Article signed by 43 authors: Guillem Cabo∗, Gerard Candón∗, Xavier Carril∗, Max Doblas∗, Marc Domínguez∗, Alberto González∗, Cesar Hernández†, Víctor Jiménez∗, Vatistas Kostalampros∗, Rubén Langarita∗, Neiel Leyva†, Guillem López-Paradís∗, Jonnatan Mendoza∗, Francesco Minervini∗, Julian Pavón∗, Cristobal Ramírez∗, Narcís Rodas∗, Enrico Reggiani∗, Mario Rodríguez∗, Carlos Rojas∗, Abraham Ruiz∗, Víctor Soria∗, Alejandro Suanes‡, Iván Vargas∗, Roger Figueras∗, Pau Fontova∗, Joan Marimon∗, Víctor Montabes∗, Adrián Cristal∗, Carles Hernández∗, Ricardo Martínez‡, Miquel Moretó∗§, Francesc Moll∗§, Oscar Palomar∗§, Marco A. Ramírez†, Antonio Rubio§, Jordi Sacristán‡, Francesc Serra-Graells‡, Nehir Sonmez∗, Lluís Terés‡, Osman Unsal∗, Mateo Valero∗§, Luís Villa†

    ∗Barcelona Supercomputing Center (BSC), Barcelona, Spain. Email: [email protected]; †Centro de Investigación en Computación, Instituto Politécnico Nacional (CIC-IPN), Mexico City, Mexico; ‡Institut de Microelectronica de Barcelona, IMB-CNM (CSIC), Spain. Email: [email protected]; §Universitat Politecnica de Catalunya (UPC), Barcelona, Spain. Email: [email protected] (author's final draft)

    VIA: A smart scratchpad for vector units with application to sparse matrix computations

    Sparse matrix operations are critical kernels in multiple application domains such as High Performance Computing, artificial intelligence and big data. Vector processing is widely used to improve performance on mathematical kernels with dense matrices. Unfortunately, existing vector architectures do not cope well with sparse matrix computations, achieving much lower performance in comparison with their dense counterparts. To overcome this limitation, we present the Vector Indexed Architecture (VIA), a novel hardware vector architecture that accelerates applications with irregular memory access patterns such as sparse matrix computations. There are two main bottlenecks when computing with sparse matrices: irregular memory accesses and index matching. VIA addresses these two bottlenecks with a smart scratchpad that is tightly coupled to the Vector Functional Units within the core. Thanks to this structure, VIA improves locality for sparse-dense computations and improves the index matching search process for sparse computations. As a result, VIA achieves significant performance speedup over highly optimized state-of-the-art C++ algebra libraries. On average, VIA achieves speedups of 4.22×, 6.14× and 6.00× on sparse matrix-vector multiplication, sparse matrix addition and sparse matrix-matrix multiplication kernels, respectively, when evaluated over a thousand sparse matrices that arise in real applications. In addition, we show the generality of VIA by demonstrating that it can accelerate histogram and stencil applications by 4.5× and 3.5×, respectively.

    This work has been supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328), and by the DRAC project, which is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. A. Barredo has been supported by the Spanish Government under Formation del Personal Investigador fellowship number BES-2017-080635. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104.

    Peer reviewed. Postprint (author's final draft).
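
    The two bottlenecks named in the abstract, irregular memory accesses and index matching, are easy to see in plain software versions of the kernels. The sketch below is a scalar software baseline written for illustration; it is not VIA, its scratchpad, or the libraries used in the evaluation. The CSR layout, the function names and the small test data are assumptions introduced here.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Sparse matrix (CSR) times dense vector.

    The read x[col_idx[k]] is a data-dependent gather: this is the
    irregular memory access pattern of sparse-dense computations.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def sparse_add(idx_a, val_a, idx_b, val_b):
    """Add two sparse vectors stored as sorted (index, value) lists.

    The comparison of idx_a[i] with idx_b[j] is the index matching
    step of sparse-sparse computations.
    """
    i = j = 0
    out_idx, out_val = [], []
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            out_idx.append(idx_a[i])
            out_val.append(val_a[i] + val_b[j])
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            out_idx.append(idx_a[i])
            out_val.append(val_a[i])
            i += 1
        else:
            out_idx.append(idx_b[j])
            out_val.append(val_b[j])
            j += 1
    # Copy whatever remains in either operand.
    out_idx += idx_a[i:] + idx_b[j:]
    out_val += val_a[i:] + val_b[j:]
    return out_idx, out_val


# Hypothetical 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CSR form.
y = spmv_csr([0, 2, 3, 5], [0, 2, 2, 0, 1], [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
s_idx, s_val = sparse_add([0, 2, 5], [1.0, 2.0, 3.0], [2, 3], [10.0, 20.0])
print(y, s_idx, s_val)
```

    Both inner loops are driven by index data rather than by a regular stride, which is why conventional vector units handle them poorly.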

    An academic RISC-V silicon implementation based on open-source components

    ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    The design presented in this paper, called preDRAC, is a RISC-V general purpose processor capable of booting Linux jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The preDRAC processor is the first RISC-V processor designed and fabricated by a Spanish or Mexican academic institution, and will be the basis of future RISC-V designs jointly developed by these institutions. This paper summarizes the design tasks, for FPGA first and for SoC later, from high architectural level descriptions down to RTL and then going through logic synthesis and physical design to get the layout ready for its final tapeout in CMOS 65nm technology.

    The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. The authors are part of RedRISCV which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politecnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for Center for Research in Computing (CIC-IPN).

    Peer reviewed. Postprint (author's final draft).